Laplacian Change Point Detection for Dynamic Graphs
Dynamic and temporal graphs are rich data structures that are used to model
complex relationships between entities over time. In particular, anomaly
detection in temporal graphs is crucial for many real world applications such
as intrusion identification in network systems, detection of ecosystem
disturbances and detection of epidemic outbreaks. In this paper, we focus on
change point detection in dynamic graphs and address two main challenges
associated with this problem: I) how to compare graph snapshots across time,
II) how to capture temporal dependencies. To solve the above challenges, we
propose Laplacian Anomaly Detection (LAD) which uses the spectrum of the
Laplacian matrix of the graph structure at each snapshot to obtain low
dimensional embeddings. LAD explicitly models short term and long term
dependencies by applying two sliding windows. In synthetic experiments, LAD
outperforms the state-of-the-art method. We also evaluate our method on three
real dynamic networks: UCI message network, US senate co-sponsorship network
and Canadian bill voting network. In all three datasets, we demonstrate that
our method can more effectively identify anomalous time points according to
significant real world events.
Comment: in KDD 2020, 10 pages
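
The pipeline described in this abstract (a spectral embedding per snapshot, compared against a short and a long sliding window) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the details, so the following are all assumptions made for illustration: using the top-k singular values of the unnormalized Laplacian as the embedding, summarizing each window by its dominant singular direction, scoring with one minus cosine similarity, taking the maximum over the two windows, and every function name and the toy cycle and star graphs.

```python
import numpy as np

def laplacian_spectrum(adj, k):
    """Embed a snapshot as the top-k singular values of L = D - A, scaled to unit norm."""
    lap = np.diag(adj.sum(axis=1)) - adj
    sv = np.linalg.svd(lap, compute_uv=False)[:k]  # returned in descending order
    norm = np.linalg.norm(sv)
    return sv / norm if norm > 0 else sv

def window_score(embeddings, t, window):
    """Return 1 - |cos| between snapshot t and the dominant direction of the previous `window` snapshots."""
    context = np.stack(embeddings[t - window:t], axis=1)  # shape (k, window)
    u, _, _ = np.linalg.svd(context, full_matrices=False)
    typical = u[:, 0]  # unit-norm summary of recent behaviour (sign is arbitrary)
    cos = min(abs(float(typical @ embeddings[t])), 1.0)
    return 1.0 - cos

def anomaly_scores(snapshots, k, short, long):
    """Score each snapshot against a short and a long window (assumed here: keep the larger score)."""
    emb = [laplacian_spectrum(a, k) for a in snapshots]
    scores = np.zeros(len(emb))
    for t in range(long, len(emb)):  # earlier snapshots lack a full long window
        scores[t] = max(window_score(emb, t, short), window_score(emb, t, long))
    return scores

def cycle_graph(n):
    """Adjacency matrix of an undirected cycle on n nodes (toy data)."""
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1
    return adj + adj.T

def star_graph(n):
    """Adjacency matrix of an undirected star with node 0 as the hub (toy data)."""
    adj = np.zeros((n, n))
    adj[0, 1:] = 1
    adj[1:, 0] = 1
    return adj

# Toy sequence: eight identical cycle snapshots, then a structural change to a star.
snapshots = [cycle_graph(8)] * 8 + [star_graph(8)] * 5
scores = anomaly_scores(snapshots, k=6, short=2, long=4)
change_point = int(np.argmax(scores))
print(change_point, np.round(scores, 3))
```

In this toy sequence the highest score falls on the first star snapshot, because both windows at that point contain only cycle graphs. Real dynamic networks are noisy, so a practical detector would also need a thresholding or ranking rule, which this sketch leaves out.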
Active Keyword Selection to Track Evolving Topics on Twitter
How can we study social interactions on evolving topics at a mass scale? Over
the past decade, researchers from diverse fields such as economics, political
science, and public health have often done this by querying Twitter's public
API endpoints with hand-picked topical keywords to search or stream
discussions. However, despite the API's accessibility, it remains difficult to
select and update keywords to collect high-quality data relevant to topics of
interest. In this paper, we propose an active learning method for rapidly
refining query keywords to increase both the yielded topic relevance and
dataset size. We leverage a large open-source COVID-19 Twitter dataset to
illustrate the applicability of our method in tracking Tweets around the key
sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method
achieves an average topic-related keyword recall 2x higher than baselines. We
open-source our code along with a web interface for keyword selection to make
data collection from Twitter more systematic for researchers.
Comment: 10 pages, 3 figures
Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4
Misinformation poses a critical societal challenge, and current approaches
have yet to produce an effective solution. We propose focusing on
generalization, soft classification, and leveraging recent large language
models to create more practical tools in contexts where perfect predictions
remain unattainable. We begin by demonstrating that GPT-4 and other language
models can outperform existing methods in the literature. Next, we explore
their generalization, revealing that GPT-4 and RoBERTa-large exhibit critical
differences in failure modes, which offer potential for significant performance
improvements. Finally, we show that these models can be employed in soft
classification frameworks to better quantify uncertainty. We find that models
with inferior hard classification results can achieve superior soft
classification performance. Overall, this research lays groundwork for future
tools that can drive real-world progress on misinformation.
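
The claim that a model with worse hard classification can still be better at soft classification can be shown with a small worked example. The abstract does not say which soft metric the authors use. The Brier score (mean squared error between predicted probability and label, lower is better) is an assumption made here for illustration, and the labels and probabilities are invented toy values, not results from the paper.

```python
import numpy as np

def accuracy(labels, probs, threshold=0.5):
    """Hard classification: fraction of thresholded predictions that match the labels."""
    return float(np.mean((probs >= threshold).astype(int) == labels))

def brier_score(labels, probs):
    """Soft classification: mean squared error of the predicted probabilities (lower is better)."""
    return float(np.mean((probs - labels) ** 2))

# Invented toy data: four true items (1) and four false items (0).
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Hypothetical model A: seven of eight correct, but extremely confident on its one mistake.
probs_a = np.array([0.99, 0.99, 0.99, 0.01, 0.01, 0.01, 0.01, 0.01])

# Hypothetical model B: six of eight correct, but its two mistakes sit close to 0.5.
probs_b = np.array([0.80, 0.80, 0.45, 0.45, 0.20, 0.20, 0.20, 0.20])

acc_a, acc_b = accuracy(labels, probs_a), accuracy(labels, probs_b)
brier_a, brier_b = brier_score(labels, probs_a), brier_score(labels, probs_b)
print(f"A: accuracy={acc_a:.3f}, Brier={brier_a:.4f}")
print(f"B: accuracy={acc_b:.3f}, Brier={brier_b:.4f}")
```

Model A has the higher accuracy, but its single confident error costs more under the Brier score than model B's two uncertain errors, so B has the better soft score.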
ToxBuster: In-game Chat Toxicity Buster with BERT
Detecting toxicity in online spaces is challenging and an ever more pressing
problem given the increase in social media and gaming consumption. We introduce
ToxBuster, a simple and scalable model trained on a relatively large dataset of
194k lines of game chat from Rainbow Six Siege and For Honor, carefully
annotated for different kinds of toxicity. Compared to the existing
state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57)
in recall. This improvement is obtained by leveraging past chat history and
metadata. We also study the implications for real-time and post-game
moderation, as well as the model's transferability from one game to another.
Comment: 11 pages, 3 figures